Proof of concept payee prediction #239

Open
wants to merge 1 commit into
base: master


@ryansouza commented Sep 19, 2024

My credit card imports produce narrations that aren't very useful or recognizable as actual companies. This is especially true with Square and other payment providers reducing the number of characters businesses seem to have available, and with merchants like Amazon that show up under tons of random-seeming permutations of their name. Ideally I'd like to apply nice readable names to my transactions for review and later reporting. I think that would be nicer than a sub-account in expenses for every single merchant.

Does this seem like a useful path to follow? I'm light on Python skills, but this branch is just barely enough code to work as a proof of concept, with no UI affordances.

[Image: Screenshot 2024-09-18 at 6 02 30 PM]

@Zburatorul
Collaborator

It's worth noting that for Amazon in particular there is a special downloader in finance_dl (and a corresponding importer in this repo) which would automatically match the credit card transaction and impute Amazon.com as the payee.
The general case of merchants without another source is tricky. Predicting the payee is harder than predicting the account because 1) there are more payees than accounts, 2) if you buy from new places you'll often find yourself out of sample, with new payees that have not appeared before and thus have no training data, and 3) it's harder to generalize to a new payee.
An example of this last point is the following:
If you have a training example with narration "Starbucks Coffee XA5757" and account Expenses:Coffee, one can see how, given a never-before-seen narration "Local Coffee Shop A97897ADD", the model could generalize and correctly predict Expenses:Coffee. But inferring that the payee is "Local Coffee Shop" from a never-before-seen example is either a generative task or an entity recognition task.
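To make the difference concrete, here is a toy sketch in Python. It is not how beancount_import or smart_importer predict anything; it is a hypothetical nearest-neighbour lookup on shared words, and the training rows (including the "Shell" one) are made up for illustration. It shows the account carrying over to the unseen narration while the payee can only ever be a label that was already in the training data.

```python
import re

# Hypothetical training data: (narration, account, payee).
TRAINING = [
    ("Starbucks Coffee XA5757", "Expenses:Coffee", "Starbucks"),
    ("Shell Oil 1234 Gas", "Expenses:Fuel", "Shell"),
]


def tokens(narration):
    """Lowercased purely alphabetic words; tokens containing digits are dropped."""
    words = re.findall(r"[A-Za-z0-9]+", narration.lower())
    return {w for w in words if w.isalpha()}


def nearest(narration):
    """Return the training example sharing the most tokens with the narration."""
    query = tokens(narration)
    return max(TRAINING, key=lambda example: len(query & tokens(example[0])))


# The unseen narration shares the word "coffee" with the Starbucks example.
_, account, payee = nearest("Local Coffee Shop A97897ADD")
print(account)  # the account label transfers to the new merchant
print(payee)    # the payee is a label copied from training, not "Local Coffee Shop"
```

A classifier picks among labels it has seen, so the best it can do for the payee here is reuse an existing one. Producing "Local Coffee Shop" would require extracting or generating text from the narration itself.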

Another approach is to write a standalone script that reads the journal and applies static rules that you would maintain manually. I would advise trying this first, because it's not hard to do from scratch and it may give you insights into solving the problem with ML.
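A minimal sketch of what such static rules could look like, in plain Python with only the standard library. The rule table, the patterns, and the `payee_for` helper are all hypothetical, and the sample narrations are invented, so treat this as a starting point rather than anything that exists in this repo.

```python
import re

# Hypothetical, hand-maintained rules: (pattern, payee). The first match wins,
# so more specific patterns should be listed before broader ones.
PAYEE_RULES = [
    (re.compile(r"^(AMZN|AMAZON)\b", re.IGNORECASE), "Amazon"),
    (re.compile(r"^SQ \*\s*BLUE BOTTLE", re.IGNORECASE), "Blue Bottle Coffee"),
    (re.compile(r"STARBUCKS", re.IGNORECASE), "Starbucks"),
]


def payee_for(narration):
    """Return the payee of the first matching rule, or None if nothing matches."""
    for pattern, payee in PAYEE_RULES:
        if pattern.search(narration):
            return payee
    return None


if __name__ == "__main__":
    # Invented narrations, only to exercise the rules.
    for narration in [
        "AMZN Mktp US*2K4XY1",
        "SQ *BLUE BOTTLE COF",
        "Local Coffee Shop A97897ADD",
    ]:
        print(narration, "->", payee_for(narration))
```

Returning `None` for unmatched narrations keeps unknown merchants visible, so they can be reviewed and turned into new rules. Wiring this into a journal, for example by walking the loaded transactions and setting the payee only where it is empty, is left out here.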

@ryansouza
Author

That makes sense. Initially I did try using the beancount example OFX and generic importer source with the https://github.com/beancount/smart_importer predictor, but that didn't seem to work and it also broke all the matching from beancount_import.

I will think a bit about where payee handling might fit. It seems like I could wrap beancount_import.source.ofx and adjust things in a separate SourceResult 🤔
